Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit

Authors

  • Jan Kratt
  • Florian Metze
  • Rainer Stiefelhagen
  • Alexander H. Waibel
Abstract

This paper describes audio-visual speech recognition experiments on a multi-speaker, large vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and present experiments on this corpus. By using visual cues as additional input to the speech recognizer, we observed good improvements on both clean and noisy speech.


Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


Improving lip-reading performance for robust audiovisual speech recognition using DNNs

This paper presents preliminary experiments using the Kaldi toolkit [1] to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular we use a single-speaker large vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kal...


The GlobalPhone Project: Multilingual LVCSR with JANUS-3

This paper describes our recent effort in developing the GlobalPhone database for multilingual large vocabulary continuous speech recognition. In particular we present the current status of the GlobalPhone corpus containing high quality speech data for the 9 languages Arabic, Chinese, Croatian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. We also discuss the JANUS-3 toolkit and ho...


The Karlsruhe-Verbmobil speech recognition engine

Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native language. Within this project the University of Karlsruhe has developed a speech recognition engine that has been evaluated on a yearly basis during the project and shows very prom...


Large-vocabulary audio-visual speech recognition by machines and humans

We compare automatic recognition with human perception of audio-visual speech, in the large-vocabulary, continuous speech recognition (LVCSR) domain. Specifically, we study the benefit of the visual modality for both machines and humans, when combined with audio degraded by speech-babble noise at various signal-to-noise ratios (SNRs). We first consider an automatic speechreading system with a p...




Publication date: 2004